A real-time speech-driven talking head using active appearance models
Authors
Abstract
In this paper we describe a real-time speech-driven method for synthesising realistic video sequences of a subject enunciating arbitrary phrases. In an offline training phase an active appearance model (AAM) is constructed from hand-labelled images and is used to encode the face of a subject reciting a few training sentences. Canonical correlation analysis (CCA) coupled with linear regression is then used to model the relationship between auditory and visual features, which is later used to predict visual features from the auditory features for novel utterances. We present results from experiments conducted: 1) to determine the suitability of several auditory features for use in an AAM-based speech-driven talking head, 2) to determine the effect of the size of the training set on the correlation between the auditory and visual features, 3) to determine the influence of context on the degree of correlation, and 4) to determine the appropriate window size from which the auditory features should be calculated. This approach shows promise and a longer term goal is to develop a fully expressive, three-dimensional talking head.
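To make the audio-to-visual mapping concrete, below is a minimal sketch, not the authors' implementation, of CCA coupled with linear regression for predicting AAM parameters from auditory features. It assumes the per-frame auditory features (e.g. MFCCs computed over a window centred on each video frame) and the corresponding AAM parameters have already been extracted and aligned; the array names, the use of scikit-learn and the number of canonical components are illustrative assumptions.

```python
# Illustrative sketch only. Assumed inputs:
#   audio_feats : (n_frames, n_audio_dims) per-frame auditory features
#   aam_params  : (n_frames, n_aam_dims) AAM parameters encoding the same frames
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import LinearRegression


def train_audio_to_visual(audio_feats, aam_params, n_components=8):
    """Find maximally correlated subspaces of the auditory and visual
    features with CCA, then learn linear maps between them."""
    cca = CCA(n_components=n_components)
    audio_c, visual_c = cca.fit_transform(audio_feats, aam_params)

    # Regress the visual canonical variates from the auditory ones ...
    audio_to_visual = LinearRegression().fit(audio_c, visual_c)
    # ... and map the visual canonical variates back to AAM parameters.
    visual_to_aam = LinearRegression().fit(visual_c, aam_params)
    return cca, audio_to_visual, visual_to_aam


def predict_aam_params(cca, audio_to_visual, visual_to_aam, new_audio_feats):
    """Predict AAM parameters for a novel utterance from audio alone; the
    predicted parameters then drive the AAM to render output frames."""
    audio_c = cca.transform(new_audio_feats)
    return visual_to_aam.predict(audio_to_visual.predict(audio_c))
```

The choice of auditory feature and of the analysis window over which it is computed correspond to experiments 1 and 4 in the abstract; the sketch leaves both open.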
Similar articles
Modelling 'Talking Head' Behaviour
We describe a generative model of ‘talking head’ facial behaviour, intended for use in both video synthesis and model-based interpretation. The model is learnt, without supervision, from talking head video, parameterised by tracking with an Active Appearance Model (AAM). We present an integrated probabilistic framework for capturing both the short-term visual dynamics and longer-term behavioural...
"Mask-bot": A life-size robot head using talking head animation for human-robot communication
In this paper, we introduce our life-size talking head robotic system, “Mask-bot”, developed as a platform to support and accelerate human-robot communication research. The “Mask-bot” hardware consists of a semi-transparent plain mask, a portable LED projector with a fish-eye conversion lens mounted behind the mask, a pan-tilt unit and a mounting base. The hardware is driven by a software anima...
Video Realistic Talking Heads Using Hierarchical Non-linear Speech-appearance Models
In this paper we present an audio-driven system capable of video-realistic synthesis of a speaker uttering novel phrases. The audio input signal requires no phonetic labelling and is speaker independent. The system requires only a small training set of video and produces fully co-articulated realistic facial synthesis. Natural mouth and face dynamics are learned in training to allow new facial p...
Previs: a person-specific realistic virtual speaker
This paper describes a 2D realistic talking face. The facial appearance model is constructed with a parameterised 2D sample-based model. This representation supports moderate head movements, facial gestures and emotional expressions. Two main contributions for talking head applications are proposed. First, the image of the lips is synthesized by means of shape and texture information. Secondl...
Lifelike Talking Faces for Interactive Services
Lifelike talking faces for interactive services are an exciting new modality for man–machine interactions. Recent developments in speech synthesis and computer animation enable the real-time synthesis of faces that look and behave like real people, opening opportunities to make interactions with computers more like face-to-face conversations. This paper focuses on the technologies for creating ...